optimization method
Regularized Nonlinear Acceleration
We describe a convergence acceleration technique for generic optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple and small linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing improved estimates of the solution on the fly, while the original optimization method is running. Numerical experiments are detailed on classical classification problems.
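As a rough illustration of the idea, the following numpy sketch computes the extrapolation weights from the residuals of a window of stored iterates. The regularization constant `lam` and the normalization of the residual Gram matrix are assumptions for illustration, not the paper's exact recommendation.

```python
import numpy as np

def rna_extrapolate(xs, lam=1e-8):
    """Regularized nonlinear acceleration (sketch).

    xs: list of iterates x_0, ..., x_k produced by any base method.
    Returns a weighted average of x_0, ..., x_{k-1}; the weights solve a
    small regularized linear system built from the residual matrix.
    """
    X = np.stack(xs, axis=1)              # d x (k+1) matrix of iterates
    R = X[:, 1:] - X[:, :-1]              # residuals r_i = x_{i+1} - x_i
    RTR = R.T @ R
    RTR /= np.linalg.norm(RTR)            # normalize for a scale-free regularizer
    k = RTR.shape[0]
    z = np.linalg.solve(RTR + lam * np.eye(k), np.ones(k))
    c = z / z.sum()                       # weights constrained to sum to one
    return X[:, :-1] @ c                  # nonlinear average of the iterates
```

In practice one would keep a sliding window of the last few iterates of the base method (e.g. gradient descent), call `rna_extrapolate` periodically, and keep the extrapolated point only if it improves the objective, so the base algorithm runs unmodified.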
SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques
SEBOOST applies a secondary optimization process in the subspace spanned by the last steps and descent directions. The method was inspired by the SESOP optimization method for large-scale problems, and has been adapted for the stochastic learning framework. It can be applied on top of any existing optimization method with no need to tweak the internal algorithm. We show that the method is able to boost the performance of different algorithms, and make them more robust to changes in their hyper-parameters. As the boosting steps of SEBOOST are applied between large sets of descent steps, the additional subspace optimization hardly increases the overall computational burden. We introduce two hyper-parameters that control the balance between the baseline method and the secondary optimization process. The method was evaluated on several deep learning tasks, demonstrating promising results.
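A minimal sketch of this kind of subspace step, assuming a generic differentiable objective: the basis construction, the plain gradient-descent inner solver, and all names are illustrative rather than the paper's SESOP-based implementation.

```python
import numpy as np

def seboost_step(f, grad_f, x, directions, n_inner=10, lr=0.1):
    """Hypothetical SEBOOST-style boosting step (sketch).

    Minimizes f restricted to the affine subspace x + P @ alpha, where the
    columns of P span recent steps / descent directions of the base method.
    """
    P = np.stack(directions, axis=1)          # d x m subspace basis
    alpha = np.zeros(P.shape[1])
    for _ in range(n_inner):
        g = P.T @ grad_f(x + P @ alpha)       # gradient w.r.t. subspace coefficients
        alpha -= lr * g
    return x + P @ alpha
```

Between boosting steps the baseline optimizer (e.g. SGD) runs unchanged for many iterations, which is why the extra subspace optimization adds little overall cost; the subspace would typically collect the last baseline steps and the displacement from the previous boosting point.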
Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses
Second, in the non-parametric machine learning setting, we provide an explicit algorithm combining the previous scheme with Nyström projection techniques, and prove that it achieves optimal generalization bounds with a time complexity of order O(n·df_λ), a memory complexity of order O(df_λ²) and no dependence on the condition number, generalizing the results known for least squares regression. Here n is the number of observations and df_λ is the associated degrees of freedom.
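As a rough illustration (not the paper's algorithm), the sketch below runs plain Newton steps on a Nyström-projected kernel logistic regression, a generalized self-concordant loss. The kernel, the landmark choice, and the regularization `lam` are assumptions, and the global-convergence safeguards analysed in the paper are omitted.

```python
import numpy as np

def nystrom_newton_logistic(X, y, centers, kernel, lam=1e-3, n_newton=20):
    """Illustrative sketch: Newton steps on a Nyström-projected
    kernel logistic loss, with labels y in {0, 1}.

    `centers` are m landmark rows of X; restricting the model to the span
    of their kernel columns keeps every Newton system m x m instead of n x n.
    """
    n = X.shape[0]
    Knm = kernel(X, centers)                      # n x m Nyström feature map
    beta = np.zeros(Knm.shape[1])
    for _ in range(n_newton):
        p = 1.0 / (1.0 + np.exp(-Knm @ beta))     # predicted probabilities
        grad = Knm.T @ (p - y) / n + lam * beta
        W = p * (1.0 - p) / n                     # logistic Hessian weights
        H = Knm.T @ (Knm * W[:, None]) + lam * np.eye(Knm.shape[1])
        beta -= np.linalg.solve(H, grad)          # Newton step in the subspace
    return beta
```

Each Newton step in this naive sketch costs O(n·m²) time and O(m²) memory in the number of landmarks m; the paper's algorithm improves on this naive cost to reach the bounds quoted above.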